Automatic Dataset Augmentation

نویسندگان

  • Yalong Bai
  • Kuiyuan Yang
  • Wei-Ying Ma
  • Tiejun Zhao
چکیده

Large scale image dataset and deep convolutional neural network (DCNN) are two primary driving forces for the rapid progress made in generic object recognition tasks in recent years. While lots of network architectures have been continuously designed to pursue lower error rates, few efforts are devoted to enlarge existing datasets due to high labeling cost and unfair comparison issues. In this paper, we aim to achieve lower error rate by augmenting existing datasets in an automatic manner. Our method leverages both Web and DCNN, where Web provides massive images with rich contextual information, and DCNN replaces human to automatically label images under guidance of Web contextual information. Experiments show our method can automatically scale up existing datasets significantly from billions web pages with high accuracy, and significantly improve the performance on object recognition tasks by using the automatically augmented datasets, which demonstrates that more supervisory information has been automatically gathered from the Web. Both the dataset and models trained on the dataset are made publicly available.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sentiment Analysis on Financial News Headlines using Training Dataset Augmentation

This paper discusses the approach taken by the UWaterloo team to arrive at a solution for the Fine-Grained Sentiment Analysis problem posed by Task 5 of SemEval 2017. The paper describes the document vectorization and sentiment score prediction techniques used, as well as the design and implementation decisions taken while building the system for this task. The system uses text vectorization mo...

متن کامل

Automatic segmentation of glioma tumors from BraTS 2018 challenge dataset using a 2D U-Net network

Background: Glioma is the most common primary brain tumor, and early detection of tumors is important in the treatment planning for the patient. The precise segmentation of the tumor and intratumoral areas on the MRI by a radiologist is the first step in the diagnosis, which, in addition to the consuming time, can also receive different diagnoses from different physicians. The aim of this study...

متن کامل

Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation

The results of chest X-ray (CXR) analysis of 2D images to get the statistically reliable predictions (availability of tuberculosis) by computer-aided diagnosis (CADx) on the basis of deep learning are presented. They demonstrate the efficiency of lung segmentation, lossless and lossy data augmentation for CADx of tuberculosis by deep convolutional neural network (CNN) applied to the small and n...

متن کامل

A Robust Real-Time Automatic License Plate Recognition based on the YOLO Detector

Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in real-world situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detector. The Convolutional Neural Networks (CNNs) are tr...

متن کامل

AHyDA: Automatic Hypernym Detection with Feature Augmentation

English. Several unsupervised methods for hypernym detection have been investigated in distributional semantics. Here we present a new approach based on a smoothed version of the distributional inclusion hypothesis. The new method is able to improve hypernym detection after testing on the BLESS dataset. Italiano. Sulla base dei metodi non supervisionati presenti in letteratura, affrontiamo il t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1708.08201  شماره 

صفحات  -

تاریخ انتشار 2017